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Amendment to the Specification: 

Please amend the specification as follows. 

Please replace the first paragraph on page 1, with following amended paragraph: 
This application is a divisional of co-pending U. S. Patent Application Serial 
Number 09/656,309, filed September 6, 2000, which is a Continuation-in-Part application of co- 
pending U. S. Patent Application Serial Number 09/391,340, filed September 7, 1999, now U.S. 
Patent No. 6.492.511 B2, which is a divisional of U. S. Patent Application Serial No. 
08/907,166, filed August 6, 1997, now issued as U. S. Patent No. 5,948,666. 

Please replace the paragraph on page 67, lines 9 to 20, with following amended 

paragraph: 

Figure 5 is a flow diagram illustrating one embodiment of an identifier process 
300 for detecting the presence of a feature in a sequence. The process 300 begins at a start state 
302 and then moves to a state 304 wherein a first sequence that is to be checked for features is 
stored to a memory 1 15 in the computer system 100. The process 300 then moves to a state 306 
wherein a database of sequence features is opened. Such a database would include a list of each 
feature's attributes along with the name of the feature. For example, a feature name could be 
"Initiation Codon" and the attribute would be "ATG". Another example would be the feature 
name "TAATAA Box" and the feature attribute would be "TAATAA". An example of such a 
database is produced by the University of Wisconsin Genetics Computer Group 
[[(www.gcg.com)]]. Alternatively, the features may be structural polypeptide motifs such as 
alpha helices, beta sheets, or functional polypeptide motifs such as enzymatic active sites, helix- 
turn-helix motifs or other motifs known to those skilled in the art. 

Please replace the paragraph spanning page 59, line 31 to page 61, line 4, with 
following amended paragraph: 

A "comparison window", as used herein, includes reference to a segment of any 
one of the number of contiguous positions selected from the group consisting of from 20 to 600, 
usually about 50 to about 200, more usually about 100 to about 150 in which a sequence may be 
compared to a reference sequence of the same number of contiguous positions after the two 
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sequences are optimally aligned. Methods of alignment of sequence for comparison are well- 
known in the art. Optimal alignment of sequences for comparison can be conducted, e.g., by the 
local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482, 1981, by the 
homology alignment algorithm of Needleman & Wunsch, J. MoL Biol 48:443, 1970, by the 
search for similarity method of person & Lipman, Proc. Natl. Acad. Sci. USA 85:2444, 1988, by 
computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in 
the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., 
Madison, WI), or by manual alignment and visual inspection. Other algorithms for determining 
homology or identity include, for example, in addition to a BLAST program (Basic Local 
Alignment Search Tool at the National Center for Biological Information), ALIGN, AMAS 
(Analysis of Multiply Aligned Sequences), AMPS (Protein Multiple Sequence Alignment), 
ASSET (Aligned Segment Statistical Evaluation Tool), BANDS, BESTSCOR, BIOSCAN 
(Biological Sequence Comparative Analysis Node), BLIMPS (BLocks IMProved Searcher), 
FASTA, Intervals & Points, BMB, CLUSTAL V, CLUSTAL W, CONSENSUS, 
LCONSENSUS, WCONSENSUS, Smith- Waterman algorithm, DARWIN, Las Vegas algorithm, 
FNAT (Forced Nucleotide Alignment Tool), Framealign, Framesearch, DYNAMIC, FILTER, 
FSAP (Fristensky Sequence Analysis Package), GAP (Global Alignment Program), GENAL, 
GIBBS, GenQuest, ISSC (Sensitive Sequence Comparison), LALIGN (Local Sequence 
Alignment), LCP (Local Content Program), MACAW (Multiple Alignment Construction & 
Analysis Workbench), MAP (Multiple Alignment Program), MBLKP, MBLKN, PIMA (Pattern- 
Induced Multi-sequence Alignment), SAGA (Sequence Alignment by Genetic Algorithm) and 
WHAT-IF. Such alignment programs can also be used to screen genome databases to identify 
polynucleotide sequences having substantially identical sequences. A number of genome 
databases are available, for example, a substantial portion of the human genome is available as 
part of the Human Genome Sequencing Project (J. Roach, 

http://w e b e r.u. Washington. e du/ - roach/human_ g e nome_ progr e ss 2.html) (Gibbs, 1995) . At 
least twenty-one other genomes have already been sequenced, including, for example, M. 
genitalium (Fraser et al., 1995), M. jannaschii (Bult et al., 1996), H. influenzae (Fleischmann et 
al., 1995), E. coli (Blattner et al, 1997), and yeast (S. cerevisiae) (Mewes et al., 1997), and D. 
melanogaster (Adams et al, 2000). Significant progress has also been made in sequencing the 
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genomes of model organism, such as mouse, C. elegans, and Arabadopsis sp. Several databases 
containing genomic information annotated with some functional information are maintained by 
different organization, and are accessible via the internet , for e xample, http://wwwtigr.org/tdb; 
http://www.g e n e tics.wisc. e du; http://genom e www.stanford. e duA ball; http://hiv web.lanl.gov; 
http://www.ncbi.nlm.nih.gov; http://www. e bi.ac.ulc; http://Past e ur.fr/other/biology; and http:// 
www.genome.wi.mit. e du. 

Please replace the paragraph on page 61, lines 5 to 27, with following amended 

paragraph: 

One example of a useful algorithm is BLAST and BLAST 2.0 algorithms, which 
are described in Altschul et al., Nuc. Acids Res. 25:3389-3402, 1977, and Altschul et al, J. Mol, 
Biol. 215:403-410, 1990, respectively. Software for performing BLAST analyses is publicly 
available through the National Center for Biotechnology Information 
[[(http://www.ncbi.nlm.nih.gov/)]]. This algorithm involves first identifying high scoring 
sequence pairs (HSPs) by identifying short words of length W in the query sequence, which 
either match or satisfy some positive-valued threshold score T when aligned with a word of the 
same length in a database sequence. T is referred to as the neighborhood word score threshold 
(Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches 
to find longer HSPs containing them. The word hits are extended in both directions along each 
sequence for as far as the cumulative alignment score can be increased. Cumulative scores are 
calculated using, for nucleotide sequences, the parameters M (reward score for a pair of 
matching residues; always >0). For amino acid sequences, a scoring matrix is used to calculate 
the cumulative score. Extension of the word hits in each direction are halted when: the 
cumulative alignment score falls off by the quantity X from its maximum achieved value; the 
cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring 
residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters 
W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for 
nucleotide sequences) uses as defaults a wordlength (W) of 1 1, an expectation (E) of 10, M=5, 
N—4 and a comparison of both strands. For amino acid sequences, the BLASTP program uses 
as defaults a wordlength of 3, and expectations (E) of 10, and the BLOSUM62 scoring matrix 
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(see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915, 1989) alignments (B) of 50, 
expectation (E) of 10, M=5, N= -4, and a comparison of both strands. 

Please replace the paragraph on page 62, lines 16 to 27, with following amended 

paragraph: 

The BLAST programs identify homologous sequences by identifying similar 
segments, which are referred to herein as "high-scoring segment pairs," between a query amino 
or nucleic acid sequence and a test sequence which is preferably obtained from a protein or 
nucleic acid sequence database. High-scoring segment pairs are preferably identified (i.e., 
aligned) by means of a scoring matrix, many of which are known in the art. Preferably, the 
scoring matrix used is the BLOSUM62 matrix (Gonnet et al., Science 256:1443-1445, 1992; 
Henikoff and Henikoff, Proteins 17:49-61, 1993). Less preferably, the PAM or PAM250 
matrices may also be used (see, e.g., Schwartz and Dayhoff, eds., 1978, Matrices for Detecting 
Distance Relationships: Atlas of Protein Sequence and Structure, Washington: National 
Biomedical Research Foundation). BLAST programs are accessible through the U.S. National 
Library of Medicine[[, e.g., atwww.ncbi.nlm.nih.gov]]. 



